Method for focusing a camera

ABSTRACT

Aspects of the present disclosure are directed to a method for focusing a camera. In one embodiment, the method includes: dividing the field of view of the camera into at least two segments; assigning, in each case, at least one operating element or at least one position of an operating element to the at least two segments; recognizing and tracking at least one object in at least two segments; automatically assigning the recognized at least two objects to the respective operating element or position of the operating element depending on which segment the objects are assigned to; and focusing the camera on the object assigned to the operating element or the position of the operating element in response to the operating element being actuated or the operating element being brought into the corresponding position.

Focusing, i.e. setting the focus area of a camera to a specific object or point in the field of view, is essential both for photography and for capturing moving images. Moving images, i.e. film recordings, are a series of shots taken one after the other. Through focusing, the object or point is displayed sharply and clearly for the viewer and his attention is directed to it. For this purpose, objects are recognized by recognition algorithms or by manual assignment, the camera is focused on them and possibly also tracked, and the focus is readjusted accordingly. In particular, if there is more than one object in the camera's field of view, it can be useful to focus on one object for a while and then shift the focus to the other object.

WO 2017/184056 A1 describes a method for shifting the focus from a first object to a second object by using an operating element. For this purpose, these objects must first be defined by the user of the camera and then each assigned to an operating element, so that the user knows which object will be focused when the operating element is actuated. This is time-consuming and laborious for the user, and it also reduces concentration during filming. This is a particular problem when shooting movies that contain a large number of different scenes, where a new object recognition and assignment must be performed each time. This slows down the use of the camera, since the objects have to be marked first, and decreases the quality of the recording, since the user is mainly occupied with setting and assigning the objects and cannot concentrate on the filming itself. The user of the camera must perform the assignment before starting the recording to ensure an image focused on an object right from the start.

The object is thus to provide a method for operating a camera that enables reliable focusing.

This object is solved according to the invention in that the field of view of the camera is divided into at least two segments, wherein at least one operating element or at least one position of an operating element is in each case assigned to the at least two segments and at least one object is in each case recognized and tracked in at least two segments, and the recognized at least two objects are automatically assigned to the respective operating element or position of the operating element depending on which segment they are assigned to, and, when the operating element is actuated or the operating element is brought into the corresponding position, the camera is focused on the object assigned to the operating element or the position of the operating element.

It is also solved in such a way that the camera visually recognizes and tracks at least one object in the field of view of the camera, that depth data is detected in at least a portion of the field of view of the camera and is assigned to at least a portion of the image components of the field of view, and that at least a portion of the depth data is used for recognizing and tracking the at least one object.

Field of view or viewing area means the area which the camera captures, i.e. the area in its angle of view, which is depicted in the form of a two-dimensional or three-dimensional image.

Segment thus means a part of the field of view. If no depth data of the recorded area is captured, i.e. if the recorded image is two-dimensional, then the segments are preferably also two-dimensional. In the case of two-dimensional segments, they have an extension along the x-axis and the y-axis, i.e. along the side extension and height extension of the image, and are bounded by a side margin or side margins. If depth data are acquired and assigned to the individual points of the field of view, the segments can also be three-dimensional.

Object thus means an image component that is of particular interest, such as a group of people, a single person, a face, an eye, or even a vehicle, a bottle, or any other object or part of an object. The object can be recognized automatically, i.e. based on an algorithm, or by being marked by the user.

Object tracking means in this case that an object that has been identified in one recording of the camera is searched for in subsequent images and identified again there. This is particularly important for moving objects or when the camera is moving. The tracking of objects is also referred to simply as tracking. Object tracking relies on methods of computer vision or machine vision.

When dividing the field of view into segments, some areas of the field of view can also remain unassigned to any segment. In other words, at least two segments can be defined in the field of view. It can also be provided that these segments preferably do not overlap.

By automatically assigning the object depending on which segment it is in, i.e. depending on its position in the field of view, a clear and unambiguous automatic assignment is made for the user, who therefore no longer has to make the assignment himself. This provides the user with intuitive, simple and fast operability. For example, a first segment can cover the left half and a second segment the right half of the camera's field of view, and a first face recognized in the first segment can be assigned to a first operating element and a second face recognized in the second segment can be assigned to a second operating element. The user thus already knows at the start of the recording which object will be focused on when he operates one or the other operating element. Of course, a large number of segments can be provided, which enables assignment to a large number of operating elements.
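
Purely by way of illustration, the following Python sketch shows how such a segment-based assignment could be computed; the helper name assign_objects_to_controls and the normalized input coordinates are assumptions of this example, not part of the method itself:

    def assign_objects_to_controls(detections, num_segments=2):
        """Assign each detected object to the operating element of the
        vertical segment of the field of view it falls into.
        detections: list of (object_id, x_center) with x_center in [0, 1]."""
        assignment = {}
        for object_id, x_center in detections:
            segment = min(int(x_center * num_segments), num_segments - 1)
            # The first object recognized in a segment claims its control.
            if segment not in assignment:
                assignment[segment] = object_id
        return assignment

    # Example: face A at x = 0.2 (left half), face B at x = 0.7 (right half)
    controls = assign_objects_to_controls([("A", 0.2), ("B", 0.7)])
    # controls == {0: "A", 1: "B"}: element 0 focuses A, element 1 focuses B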

It can be provided that the camera automatically focuses on the first object in a certain segment when it is recognized. This is particularly advantageous at the beginning of the film, when no targeted focusing has yet been performed, since the first object is thus focused at the moment it is recognized and the image thus has a meaningful focus right at the beginning. In this case, a specific segment can be determined and the focus can be directed to the first object that is recognized or assigned in this segment. Alternatively, it can be provided that the focus is directed to that object which is recognized or assigned first in any segment. If additional objects are identified or defined after the first object, the focus may remain on the first object until the actuation of an operating element or other action or trigger causes the focus to change.

It can be provided that automatic object recognition is carried out in the entire field of view or at least part of the field of view, independently of the segments. This facilitates the assignment of objects when they move into a segment.

The segments and also recognized or tracked objects can be displayed on a screen for the user. This allows the user to see where the segments are arranged, what shape they have, which objects are recognized and tracked and/or which object has been assigned to which operating element.

It can be provided that the segments can be defined by the user in the field of view in a setting step, preferably before the start of the recording. Various segmentations can also be suggested to the user for selection.

Preferably, each segment is assigned exactly one operating element or one position of an operating element. However, it can also be provided to assign several operating elements to at least one segment. This allows several objects identified in a segment to be assigned to the various operating elements of the segment in a defined manner. This can be carried out, for example, from left to right in the field of view, from front to back, or in the order of identification or assignment.

It can be provided that the first object recognized in a segment is assigned to an operating element or position of an operating element, and the next object in this segment is assigned to another operating element or position of an operating element assigned to it. This can be carried out until all operating elements are assigned to objects in the segment. If further objects are identified in the segment afterwards, it can be provided that these are not assigned to any operating element or position of an operating element. It is immaterial whether the object is located in the segment at the moment of the first recognition or has already been recognized in previous shots and enters the segment by tracking through the movement of the object.

It is particularly advantageous if this automatic assignment of at least two objects to a respective operating element or position of an operating element is maintained when at least one object moves to another segment to which it is not assigned. In other words, in a first or initiating step, the described assignment of the objects to the operating element or position of the operating element can be made and then maintained in the further sequence. This assignment can be made by a specific event, such as actuating a set operating element, turning the camera off and on again, the object leaving the viewing area, preferably over a certain minimum time, or the identification of new objects. However, this assignment can also occur when the object itself appears in a certain segment, or is identified in a certain sub-segment of the segment. In this way, the assignment is prevented from being changed during the recording by the movement of the objects in the field of view. This is because after the first step of automatic assignment, the user, who knows or can also be shown where the segments are set, is aware of which object is assigned to which operating element and may wish to maintain this assignment, regardless of the current position of the objects in the field of view.

The operating elements can be designed in different ways, but it is essential that they have at least two defined positions that they can assume, for example like buttons or keys. In this sense, it is advantageous if at least one operating element is a knob, rotary knob, joystick or slider with at least two positions. A slider means an operating element that can be brought into at least two defined positions by moving or turning it along at least one direction. Thus, by moving it from one position to the other, a transition of the focus can be controlled, preferably by the user. Preferably, the speed of the focus transition depends on the speed of movement of the slider.
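
As a minimal sketch of such a slider, assuming the focus distance is interpolated linearly between the distances of the two assigned objects (the function name and the normalized slider range are illustrative assumptions):

    def focus_distance(slider_pos, dist_a, dist_b):
        """Map a slider position in [0, 1] (left stop .. right stop) to a
        focus distance between the two assigned objects. Because the focus
        follows the slider, moving the slider slowly yields a slow focus
        ramp and moving it quickly yields a fast one."""
        slider_pos = max(0.0, min(1.0, slider_pos))
        return dist_a + slider_pos * (dist_b - dist_a)

    # The left stop focuses object A, the right stop focuses object B:
    assert focus_distance(0.0, 2.5, 4.0) == 2.5
    assert focus_distance(1.0, 2.5, 4.0) == 4.0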

The operating elements can also be part of a graphical user interface. The buttons, the slider or rotary control, a joystick, etc. can thus be elements displayed on a screen, which are moved or triggered via touchscreen, mouse or similar controls, for example.

If the segments are assigned to different positions of an operating element, it is advantageous if moving the operating element in the direction of at least one position shifts the focal plane in space in the direction of the object assigned to the position. If the segments are assigned to different operating elements, it is advantageous if actuating an operating element automatically shifts the focus to the plane of the object assigned to this operating element.

Preferably, the objects are recognized automatically. This means that a computing unit of the camera, the camera system or the 3D sensor recognizes the objects without the user having to mark and thus define the objects individually in the field of view.

In this case, a certain type of object can be preset for one or more segments. For example, it can be provided that the user can set for each segment whether an occurring object can be a face, a car and/or a bottle.

In this sense, it may be provided that object recognition includes feature analysis in which visual features of an object in an area are searched for, analyzed, and identified, and the object is recognized based on these features and its movement is tracked. In other words, feature tracking is performed, i.e. features of a certain type are searched for in the field of view or part of the field of view and an object is recognized based on these features. These features are visual features in the image and can be, for example, certain colors, edges, contours, textures, contrasts, or characteristic combinations of shapes and colors such as a face or eyes. Based on these features or the combination of such features, possibly also depending on their respective positions (for example a defined distance between two eyes and a nose to recognize a face), an object can be recognized and distinguished from the rest of the image, as well as tracked. In this context, the area can be the segment, another part of the field of view, or the entire field of view.
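
One possible realization of such feature tracking, sketched here with OpenCV's corner detection and pyramidal Lucas-Kanade optical flow (the region format and function names are assumptions of this example, not prescribed by the method):

    import cv2
    import numpy as np

    def seed_features(gray, region):
        """Find corner features inside a marked object region (x, y, w, h)."""
        x, y, w, h = region
        mask = np.zeros_like(gray)
        mask[y:y + h, x:x + w] = 255
        return cv2.goodFeaturesToTrack(gray, maxCorners=50,
                                       qualityLevel=0.01, minDistance=5,
                                       mask=mask)

    def track_features(prev_gray, curr_gray, prev_points):
        """Follow the features from the previous frame into the current one;
        from image to image the features change only slightly."""
        next_points, status, _ = cv2.calcOpticalFlowPyrLK(
            prev_gray, curr_gray, prev_points, None)
        return next_points[status.flatten() == 1]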

It can also be advantageous if the object recognition includes recognition via a deep-learning algorithm. Deep-learning algorithm refers here to the processing of recorded image data by a multi-layered neural network that has been trained in advance using a large number of sample data sets. For example, the deep-learning algorithm can be trained to recognize persons in general, or exactly one particular person, by inputting very many images of persons or of that particular person from different perspectives, with different facial expressions, different exposures and the like. These learned objects are stored in the trained networks and are used by the algorithm to identify objects in the images. By processing recorded data, features and/or even whole objects can be recognized directly.

It is very advantageous if already trained data sets or networks can be used. For example, trained data sets are available for the human body, the face, or the limbs (e.g., in the skeleton tracking method). This means that with the help of existing deep-learning algorithms, people as well as objects can be automatically recognized and tracked in an image in a very reliable and robust manner.
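
By way of example only, such a pretrained network could be used as follows; the sketch relies on torchvision's Faster R-CNN trained on the COCO data set (in which label 1 denotes a person), which is one publicly available trained network and not the one prescribed by the method:

    import torch
    import torchvision

    # Load a detection network pretrained on COCO.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    def detect_persons(image_tensor, threshold=0.8):
        """Return bounding boxes of persons in a CxHxW float image tensor."""
        with torch.no_grad():
            output = model([image_tensor])[0]
        keep = (output["labels"] == 1) & (output["scores"] > threshold)
        return output["boxes"][keep]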

It can further be provided that at least one object is marked by the user, the algorithm of a computing unit of the camera, the camera system or the 3D sensor recognizes the features of this object, and the object is tracked on the basis of its features. From image to image, the features change only slightly even if the object moves, and they can thus be identified in the subsequent image. Through such feature detection, an object can be easily defined and tracked. This is especially useful for objects that are not usually tracked or that are recognized poorly or not at all by a deep-learning algorithm or feature tracking.

The object recognition and tracking methods described can, of course, be used in any combination, thus reducing their drawbacks and/or inaccuracies.

In general, cameras record a two-dimensional field of view, with an x-axis and a y-axis spanning the two-dimensional field of view.

It is particularly advantageous if depth data is recorded in at least part of the camera's field of view and this depth data is assigned to at least some of the image components of the field of view, wherein preferably, before the assignment, another camera records a real image and at least some of the image components of the real image are assigned to the depth data. In this case, depth data means distance data of individual pixels or image components to the camera. In other words, this is distance data along a z-axis which is normal to the plane spanned by the x-axis and the y-axis. Depth data can be assigned to each image component, for example to each pixel. By assigning depth data, more information about the image is available, which can facilitate processing of the image. For example, it is particularly advantageous if depth data of the objects is used to adjust the focus. This can speed up and improve the focusing process.
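
A minimal sketch of such a depth lookup, assuming a depth image already registered to the real image so that depth_image[y, x] holds the distance of pixel (x, y); the bounding-box format is an assumption of this example:

    import numpy as np

    def object_distance(depth_image, bbox):
        """Robust distance of an object: median depth inside its bounding
        box (x, y, w, h); zero-valued (invalid) depth pixels are ignored."""
        x, y, w, h = bbox
        patch = depth_image[y:y + h, x:x + w]
        return float(np.median(patch[patch > 0]))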

It is also advantageous if the depth data of the objects is permanently recorded. This allows automated and fast refocusing from one object to another, since both distances are known at all times and do not have to be determined during the focusing process (as is usually the case). The movement of the operating element preferably determines the duration of the focusing process. This results in a uniform focus ramp in space.

It is particularly advantageous if the segments are at least partially delimited not only by side edges but also by depth edges. This delimits the segments three-dimensionally. Segments can thus also be arranged one behind the other along the z-axis. This allows even better segmentation and more design freedom in focusing for the user. Side edges mean edges of the two-dimensional field of view, i.e. edges that span between the x-axis and the y-axis.

Furthermore, it can be provided that the acquisition of the depth data is at least partially performed via at least one 3D sensor, which is preferably attached to the camera.

A calculation unit can use a 3D sensor to generate a matrix of distance values. For example, the 3D sensor can consist of a stereoscopic camera array, a TOF camera, a laser scanner, a lidar sensor, a radar sensor, or a combination of different 3D sensors to improve measurement quality, range, and resolution.

Preferably, it is provided that another camera takes a real image and at least a part of the image components of the field of view is assigned to the depth data. Thus, at least a part of the pixels of the real image is assigned to at least a part of the respective depth data, whereby the distance of this pixel to the camera is known. For this purpose, the 3D sensor preferably has an additional camera which generates a real image and which is therefore referred to here as a real image camera. The 3D sensor and the real image camera are preferably mechanically fixed to each other and calibrated. The display perspectives are preferably the same. Thus, a distance value can be assigned to each recognizable pixel of the real image camera. This assignment is called a depth image. Preferably, this real-image camera has an infinitely large depth-of-field range in order to be able to sharply depict all objects in the image. Preferably, this real-image camera has a large exposure range (e.g., through an HDR mode) in order to be able to image subjects of varying brightness uniformly. Alternatively, the image of the camera itself can be used to create the depth image; in other words, the camera can represent the further camera.

It is advantageous if the acquisition of depth data includes the triangulation of data from at least two auxiliary cameras. In this case, the camera itself can also be one of the auxiliary cameras. The auxiliary cameras are preferably arranged at a defined distance from each other and are positioned at a defined angle to each other so that triangulation is easily possible.
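
For a rectified stereo arrangement of two such auxiliary cameras, the triangulation reduces to the classic relation Z = f * B / d; a sketch with illustrative numbers:

    def depth_from_disparity(disparity_px, focal_px, baseline_m):
        """Depth Z = f * B / d for a rectified stereo pair: focal length f
        in pixels, baseline B in meters, disparity d in pixels."""
        if disparity_px <= 0:
            raise ValueError("disparity must be positive")
        return focal_px * baseline_m / disparity_px

    # Example: f = 800 px, B = 0.12 m, disparity = 24 px  ->  Z = 4.0 m
    assert abs(depth_from_disparity(24, 800, 0.12) - 4.0) < 1e-9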

In a preferred orientation, the optical axis of the 3D sensor is arranged parallel to the optical axis of the camera in order to achieve a measurement of depth data orthogonal to the optical axis of the camera.

It is also particularly advantageous if the depth image of the 3D sensor is at least partially calculated into the image of the camera. This means that the perspectives of the images can be merged.

The spatial assignment of a part of the camera's field of view can also be performed by position sensors attached to the subject. Such position sensors can be, for example, microwave, radar or sound sensors or transponders, which enable distance and angle determination in relation to a base station using various physical methods, such as transit-time measurement, phase measurement or field-strength measurement. If the base station is attached to the film camera, such sensors can be used to determine the spatial position of the subject in space.

All of these methods can be used to determine the distance between the camera and a portion of the field of view, thus assigning a distance to each portion of the field of view.

If it is intended to visually track an object in a video image, one is confronted with the fact that the object can often disappear from the camera's field of view. Simple tracking algorithms, such as eye, face or person tracking, often have the following disadvantages:

-   The object to be tracked is partially or completely covered by another subject or by other objects for a short time or longer.

-   The object to be tracked leaves the field of view of the camera and comes back into view after a short or longer period of time.

-   The object to be tracked rotates, or the film camera moves around the object, changing the features of the object to a very high degree. For example, the eyes, the face or certain other features are no longer visible in the image. Here the tracking algorithm may lose its target object.

-   A feature of an object is covered or is no longer visible in the real image. If there is another object with the same or very similar features in the camera's field of view, the tracking algorithm must not jump to the similar object. Here, the tracking algorithm must be prevented from jumping to a subject with similar features if the features of the original object are no longer visible.

In general, the following disadvantages would arise if the real image of the film camera itself were used to track an image area:

-   Film lenses have a small depth-of-field range. They also deliberately render desired areas of the image very blurred. However, if image areas are out of focus, tracking cannot be performed in these areas, because objects in the video image cannot be recognized and analyzed. If it is intended to move the focal plane from area A to area B, this may not be possible because area B is not recognized, being blurred in the real image of the film camera. An object selection in the real image of the film camera can be problematic or even impossible.

-   Some areas in the real image of the film camera are exposed in such a way that no image analysis can take place in these areas.

Thus, it is particularly advantageous if a tracking algorithm for moving objects in front of the camera is built to be very robust against various disturbances.

The invention also relates to a method for tracking objects in video or film cameras. The main feature is the support or fusion of different tracking algorithms, in particular the inclusion of depth data.

According to the invention, the tracking process consists of the following steps:

-   Image capture: A real image and a depth image are generated in a 3D sensor.

-   Transformation/Rectification: If the real image and the depth image are generated by two different sensors or by the camera itself, they must be brought into agreement. This means that each pixel of the real image should be assigned the corresponding depth value. This is carried out by image transformation and rectification based on the parameters of the sensors, the optics and their geometrical arrangement to each other.

-   Segmentation: Contiguous areas can be extracted from the depth image. Individual objects/subjects in the room are grouped together as segments based on their common (similar) distance. Planes (such as the floor or walls) can likewise be combined into segments. This allows interesting subjects to be delimited from uninteresting ones.

-   Prediction: A computational prediction of the motion information is made based on the previous images and on the physical laws governing the objects to be tracked (objects can only move at a speed of at most vmax in relation to the video camera). The prediction results in a possible distance range L_(XYZ)(n) in which the object can be located. For very short scanning intervals, the distance range is very small and the prediction accuracy of where the object is located is very high.

-   Region determination: From the prediction of the distance, all image areas of the real image which lie outside the distance range L_(XYZ)(n) can be excluded. This means that in the real image all image areas which definitely do not lie in the area of the prediction can be faded out. In addition, from the segmentation of the depth image, regions can be extracted from the real image on which the tracker can fix; for example, floors, walls or the background can be hidden in the image region. This allows the interesting areas of the real image to be restricted again. With both measures, the image area of the real image in which to look for the object is very limited, and the tracking algorithm becomes very robust due to the prediction and the segmentation by the depth image. The transformation/rectification (a scale-accurate, distortion-corrected and correctly aligned image) of the depth image into the real image is a precondition of this region determination.

-   Object tracking: The camera image pre-processed in this way can now be fed to the object tracking. Possible tracking algorithms are, for example, deep-learning algorithms or feature detection. Tracking can be carried out based on different features, such as the person, the limbs (skeleton tracking), the face, the head/shoulder region, or the eyes.

It is also particularly advantageous if at least part of the depth data is used to recognize and track at least one object. The differentiation of objects and/or their delineation from the environment can thus be improved. This speeds up object recognition and reduces the error rate, thereby leading to improved usability. In particular, this depth data can be advantageous during tracking, for example when an object temporarily leaves the field of view.

Visual recognition and tracking means that recognition and tracking are based at least in part on visual data, i.e. on image data. The depth data can be included as supporting additional information.

It can be provided that the depth data is at least partially recorded by a 3D sensor on the camera. This provides a simple means of recording the depth data. Recording in the area of the camera is particularly advantageous, as this makes it easier to link the depth data with the image data of the camera. Preferably, the 3D sensor is brought into a defined position and distance from the camera.

Furthermore, it is advantageous if at least one segment is defined in the field of view of the camera and focus is set on a recognized and tracked object as soon as it is located in the segment. This automatically sets the focus on an object that is highly likely to be important. It can be provided in this case that only a defined, recognized and tracked object, or an object of a defined object class, is focused on as soon as it is located in the segment.

In a preferred embodiment, it is provided that at least one object is assigned a maximum range of motion per unit of time and this range of motion is included in object recognition and object tracking. This allows objects to be tracked better and more robustly, and to be recognized even if they are temporarily obscured. Physically, it is impossible for objects to suddenly appear in a different segment when moving, because they are subject to a maximum velocity or acceleration. Thus, if an object is covered by another object while being tracked in the video image, and this covering object is similar to the original object, the tracking algorithm may jump to the new object. Physically, however, such a jump is usually not possible if the position jump is above a maximum possible acceleration of the motion vector in space.

The maximum range of motion per time, for example the maximum range of motion per single shot, makes it easier to find the object in the field of view from shot to shot, since the range of motion limits the area in which the object can be located. However, this margin does not have to be an absolute constraint, but can serve as a weighted single aspect among multiple aspects (such as feature tracking) to identify and track the object. In other words, the tracking is stabilized by the spatial limitation in the x-, y- and/or z-direction to a maximum possible movement range. Thus, the primary tracking algorithm is supported by a spatial component.
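
A minimal sketch of such a movement-range check, assuming a per-frame limit K applied independently along each axis (the tuple-based position format is an assumption of this example):

    def within_motion_range(last_pos, candidate_pos, k_per_frame):
        """Accept a candidate (x, y, z) position only if it lies within the
        maximum range of motion K around the last valid position."""
        return all(abs(c - l) <= k_per_frame
                   for l, c in zip(last_pos, candidate_pos))

    # A jump produced by a similar-looking occluding object is rejected:
    assert not within_motion_range((0.5, 0.5, 3.0), (0.9, 0.5, 1.2), 0.05)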

It is also possible to measure the speed, direction of movement and/or acceleration of an object and include this in the object tracking. This makes it easier, for example, to re-recognize an object that moves at a known speed through the field of view and temporarily disappears behind an obstacle, once it emerges on the other side of the obstacle. It is particularly advantageous if the original assignment to the operating element or position of the operating element is then maintained.

With the introduction of a motion range and the spatial subdivision of the camera's field of view into segments, it is possible to a certain extent to automatically detect and track objects that leave the field of view and re-enter it, by the spatial assignment of the objects upon re-entry, as long as they re-enter the image within the motion range.

Furthermore, it is advantageous if at least one object is assigned to an object class and the movement range is selected depending on the object class. The movement range can thus be dependent on the type of object or can be set individually by the user. Different features, which are preferably searched for, can be stored in a database for different object classes.

In the following, the present invention will be explained in more detail with reference to a non-limiting embodiment variant shown in the figures, wherein:

FIGS. 1.1 and 1.2 show the method according to the invention and a device using the method according to the invention in a first embodiment in schematic views;

FIG. 2.1 shows the method according to the invention and a device using the method according to the invention in a second embodiment in a schematic view;

FIG. 2.2 shows the field of view of the second embodiment;

FIG. 2.3 shows the field of view of the second embodiment during the movement of an object;

FIG. 3 shows a block diagram of a tracking process in an embodiment according to the invention for focusing a camera;

FIG. 4 shows a block diagram of a tracking process in an alternative embodiment according to the invention for focusing a camera.

In FIGS. 1.1 and 1.2, the method is explained in more detail using a first embodiment.

Two persons A and B are standing in the field of view 3 of the camera 1. These persons represent objects that are to be traced, i.e. tracked. A 3D sensor 2 is attached to the camera 1. In the 3D sensor 2, there is a further camera 4 designed as a real image camera 4. In the video image of this real image camera 4, the persons can be automatically tracked by an existing computing unit with the help of known tracking algorithms (for example a deep-learning algorithm, skeleton tracking, . . . ) as soon as the process is started by the user.

For example, the field of view 3 of the camera 1 can be divided into two segments: a left segment 5 and a right segment 6. This means that person A can be assigned to the left segment 5 and person B to the right segment 6. Since the real image of the 3D sensor 2 is stored with depth values, i.e. for each image area or pixel of the real image there is also depth information, the distance of the persons can be determined from the tracking points by the 3D sensor 2. For this purpose, each person is summarized as a common object. This is represented by distance D_(A) for person A and distance D_(B) for person B. Person A is drawn further back and thus smaller. The left segment 5 is assigned to a first position of an operating element and the right segment 6 to a second position of the operating element. In this embodiment, the operating element is a slider, the first position being the left stop and the second position the right stop. The corresponding distance D_(A) is assigned to the left stop of the slider 7, since person A is standing in front of the camera in the left segment. The distance D_(B) is assigned to the right stop of the slider. From the distances, the corresponding position on the focus lens 8 can be calculated and set in a lens control unit. The focus lens 8 is moved to the corresponding position and focused. Distance D_(A) focuses on object A, distance D_(B) focuses on object B. If the slider 7 is now moved from the left stop, which corresponds to object A, to the right stop (arrow 9), the focal plane moves from object A to object B in space. The movement of the slider thus corresponds to a focus ramp, i.e. it determines how fast the focus moves from object A to object B. If the objects A, B move in space, the tracking points follow them automatically, and thus also the distances, and it is still possible to move the focus from object A to object B with the slider without the user having to redefine the distances to the objects. With this method, the user can control the timing and duration of the focus shift very easily and intuitively, since he only has to move one operating element 7 and does not have to concentrate on the objects themselves.

The field of view 3 of the camera 1 could also be divided into a front and a rear segment, and thus object A or object B could be automatically assigned to a stop of the slider. Once the assignment has been carried out, it remains in force even if the objects move.

It is also possible to set a maximum distance beyond which no objects are tracked. This makes it possible to track only objects in the foreground and ignore all objects in the background.

In FIGS. 2.1, 2.2 and 2.3, the method is explained in practice using a second embodiment.

A person A representing an object is standing in the field of view 3 of a camera 1. A 3D sensor 2 is attached to the camera 1. A real image camera 4 is located in the 3D sensor 2. In the video image of this real image camera 4, the person can be automatically tracked by an existing computing unit using known tracking algorithms (for example, a deep-learning algorithm, skeleton tracking, face tracking, feature detection, . . . ) as soon as the process is started by the user. Since the real image of the 3D sensor 2 is stored with depth values, i.e. for each image area or pixel of the real image depth information is available, the distance of the person A can be determined from the tracking points by the 3D sensor 2. The tracking algorithm could also run on the video image of the camera 1, if the image of the camera has been stored with depth information. In this case, as described, the real image of the 3D sensor 2 can first be stored with the depth values and then combined with the image of the camera 1, or the depth values can be stored directly with the image of the camera 1. From the measured distance, the corresponding position on the focus lens 8 can be calculated in a lens control unit. The focus lens 8 is moved to the corresponding position and focused. If the person A moves in space, the tracking point automatically follows him; thus the distance follows as well, and the focus is automatically kept on the moving object.

From the distance D_(A) of person A, the tracking algorithm can now be made more robust, but also more efficient and faster. Depending on its physically possible speed, the person can only move within a certain movement range in space during a certain sampling time, which preferably corresponds to the recording time of the video image. The tracking algorithm can thereby assign the object to an object class, in this case the class "person", and retrieve a maximum movement range 10 depending on the class. Person A can have moved from one image to the next only by L_(XA) in the x-direction and L_(YA) in the y-direction relative to the previous image. Regions outside this segment are not possible. If, in case of an error, the tracking algorithm were to report another position in the x/y plane of the image, this can be excluded. The tracking region of the current image must lie in the L_(XA) and L_(YA) segment.

The detection of the depth by the 3D sensor 2 proves to be particularly advantageous. Person A can only move in the z-direction by L_(ZA). Any movement outside this margin is physically impossible and can be excluded.

The example in FIG. 2.3 shows the advantage. Person A is detected by face tracking. The face tracking is marked with F_(A). If a second person now enters the image, face tracking would also be performed here. However, this tracking position does not need to be considered, because it is outside the possible region. If person B is closer to camera 1 and moves through the image, person B covers person A partially or completely. Face tracking at person A is no longer possible during the coverage. Even now, the tracking position does not jump to the face of person B, although the tracking position of face B would be the same as or similar to the position of face A. In the spatial direction, however, it would be a distance jump, which is not possible. Therefore, the position of the focus does not change. In most cases this is not a problem, because person A is covered in the image and therefore not visible. If person B uncovers person A in the image again, face tracking F_(A) is again possible here and a corresponding distance is determined. If person A has moved a little in the meantime, the focus can jump to this position, or it can be shifted to the corresponding position by a temporal ramp. The 3D sensor 2 also determines the speed of the objects in space. If person A is covered, the future position in space can be inferred from their previous speed, and the focus can be moved onward even while they are covered.

If person A and person B are already in the field of view 3 at the start point, the user can alternate between the tracked face positions by simply pressing a button and thus set the start position to face F_(A).

It is also possible to set a maximum distance beyond which no more people or objects are tracked. This makes it possible to track only objects in the foreground and ignore all objects in the background.

Due to the existing depth image, it is possible to black out (fade out) all areas in the image that are not within the specified limit. The regions where the tracking algorithm has to search for the target in the image can thus be greatly restricted. It becomes more robust and efficient.

FIG. 3 shows a block diagram of a section of a possible object tracking. This represents one processing possibility in a processing logic for tracking objects.

In order to make the tracking algorithm more robust with depth data, there are two ways: prediction and region determination.

FIG. 3 shows the process based on prediction in a flowchart.

In this case, in a first step, an image of the camera 4 is read in (101). This can also be the image of camera 1 if the camera and the 3D sensor are calibrated with respect to each other, i.e. if the depth data of the 3D sensor 2 can be matched with the image areas of camera 1 with perspective accuracy. In the next step, the last validly calculated position of the tracked object A is adopted (102). The position of the object A last calculated by the tracking algorithm (108) is transferred to this image (101). With the help of the allowed position change K (104) per sampling interval, the region can be calculated in which the new position of the object is allowed to lie (103). The permissible position change can be fixed depending on the object, or it can be entered by the user before the start point or changed during the running process. Likewise, the depth image (105) is read in from the 3D sensor 2. The last valid distance is adopted (106). The region in which the new distance of the object may be located (107) can be calculated from the permitted position change K (104).

After this preparation, a new tracking process is started in the real image (108). It can be any algorithm. In the example, the face of a person is searched for: a face tracking F_(A) is performed. The tracking algorithm returns a new position X(n), Y(n) along the x- and y-axis of the image (109). A subsequent Kalman filtering is used to stabilize the position (110). In the next step, it is checked whether the new position is within the ranges L_(XA)(n) and L_(YA)(n), which indicate the limits of the allowed region (113). If the new position (n) is outside the boundaries, no valid position determination has taken place (114). This means that an object has been found in the image which cannot be the previous object, because the change in position is above the physically specified rate of change K. The position determination is discarded. The last valid position (n-1) remains valid.

Likewise, a new distance is determined (111) using the new position (n) (109) in the depth image. This distance is also stabilized with a subsequent Kalman filtering (112). Again, it is checked whether the newly calculated distance is within the possible limits L_(DA)(n) (115). If the change in distance is greater than would be allowed, the new position determination is again discarded (116). The last valid position (n-1) remains valid. Only if the position is in the possible region (113) and at the allowed distance (115) has a valid tracking taken place, and this new position and the new distance are made available to the algorithm for a further calculation step (117).

The new valid distance D_(A)(n) is used for the focus adjustment (118) and transmitted to the lens control system (119).
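
A condensed sketch of this validation logic (steps 113 to 117), with separate limits for the image position and the distance; the function name and parameters are assumptions of this example:

    def validate_track(new_xy, new_dist, last_xy, last_dist, k_xy, k_z):
        """Accept the tracker output only if both the image position (113)
        and the distance (115) changed by no more than the allowed rate;
        otherwise the last valid state is kept (114, 116)."""
        ok_xy = all(abs(n - l) <= k_xy for n, l in zip(new_xy, last_xy))
        ok_z = abs(new_dist - last_dist) <= k_z
        if ok_xy and ok_z:
            return new_xy, new_dist    # valid tracking (117)
        return last_xy, last_dist      # position determination discarded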

Tracking an object can also be made more robust by reducing the search range of a tracking algorithm through segmentation or regional restriction of the real image. Excluding segments or regions in which searching is not allowed because they are physically impossible (because they are outside the possible movement range of the object) can make the algorithm faster, more efficient, and more robust. FIG. 4 shows the schematic structure of the algorithm.

In the first step, the real image of the camera 1 or of the 3D sensor 2 is read in (201). In the next step, the last valid calculated position X_(A)(n-1), Y_(A)(n-1) of the tracked object A is taken over (202). In addition, the depth image is read out from the 3D sensor (203) and the last valid distance D_(A)(n-1) is taken over (204). With the help of the allowed position change K per sampling interval (206), the distance range L_(ZA)(n) = D_(A)(n-1) ± K can be calculated, in which the object may be physically located (205). All points of the depth image which are outside the allowed range can be set to zero or blacked out (207). This marks the allowed segment or region. The allowed region of the depth image can now be placed as an overlay over the real image. All regions of the real image are hidden or blacked out if they lie in areas of the depth image which are not allowed (208). Thus, the impermissible regions in the real image are faded out and are no longer visually apparent. The real image has been reduced to a partial image. Object tracking can now be performed in this image (209). Objects, e.g. faces, are now no longer visible in the real image if they are located at impermissible distances from the camera. It is thus only possible to track objects in the restricted partial image. The partial image visually displays only the physically possible regions of the original real image. If the desired object A is still present in the partial image (210), a new tracking position X_(A)(n), Y_(A)(n) is calculated for object A (212). This position and the associated new distance D_(A)(n) of object A, now calculated from the depth image (213), can be passed on for the next image to be tracked. Ideally, the newly obtained values are also run through a Kalman filtering. The new valid distance D_(A)(n) is used for the focus adjustment and transmitted to the lens control system (214) to adjust the focus of the camera accordingly. If no new valid distance was determined, the focus remains at its last valid position until a new valid value is calculated.
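
A minimal sketch of this region restriction (steps 205 to 208), assuming a depth image registered pixel-accurately to the real image:

    import numpy as np

    def mask_by_depth(real_image, depth_image, last_dist, k):
        """Black out all pixels whose depth lies outside the physically
        possible range D_A(n-1) +/- K before the tracker sees the image."""
        allowed = np.abs(depth_image - last_dist) <= k
        restricted = real_image.copy()
        restricted[~allowed] = 0   # hide impermissible regions (207, 208)
        return restricted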

CLAIMS

1. A method for focusing a camera, including the steps of: dividing the field of view of the camera into at least two segments; assigning, in each case, at least one operating element or at least one position of an operating element to the at least two segments; recognizing and tracking at least one object in at least two segments; automatically assigning the recognized at least two objects to the respective operating element or position of the operating element depending on which segment the objects are assigned to; and focusing the camera on the object assigned to the operating element or the position of the operating element in response to the operating element being actuated or the operating element being brought into the corresponding position.

2. The method according to claim 1, characterized in that the automatic assignment of the at least two objects to one operating element each or position of an operating element is maintained if at least one object moves to another segment to which it is not assigned.
3. The method according to claim 1, characterized in that the at least one operating element is a knob, rotary knob, joystick or slider having at least two positions.
4. The method according to claim 1, characterized in that depth data is recorded at least in a part of the field of view of the camera and at least a part of the image components of the field of view are assigned to these depth data, and before the assignment a further camera records a real image and at least a part of the image components of the real image are assigned to the depth data.
5. The method according to claim 4, characterized in that the at least two segments are at least partially delimited not only by side edges but also by depth edges.
6. The method according to claim 1, wherein the at least one object is visually recognized and tracked in the field of view of the camera, characterized in that depth data is acquired at least in a portion of the field of view of the camera and assigned to at least a portion of the image components of the field of view, and in that at least a portion of the depth data is used for recognizing and tracking the at least one object.
7. The method according to claim 1, characterized in that depth data is acquired at least in a portion of the field of view of the camera and is associated with at least a portion of the image components of the field of view, and in that at least a portion of the depth data is used for visual recognition and tracking of the at least one object.
8. The method according to claim 1, characterized in that at least one segment is defined in the field of view of the camera and focus is set on a recognized and tracked object as soon as it is located in the at least one segment.
9. The method according to claim 1, characterized in that the at least one object is assigned a maximum movement range per time unit and this movement range is included in the object recognition and object tracking.
10. The method according to claim 9, characterized in that the at least one object is assigned to an object class and the movement range is selected depending on the object class.
11. The method according to claim 1, characterized in that the object recognition comprises feature analysis in which visual features of an object in an area are searched for, analyzed and identified, and the object is recognized based on these features and its movement is tracked.
12. The method according to claim 1, characterized in that the object recognition comprises recognition via a deep-learning algorithm.
13. The method according to claim 6, characterized in that the acquisition of the depth data is performed at least in part via at least one 3D sensor attached to the camera.
14. The method according to claim 4, characterized in that a further camera records a real image and at least some of the image components of the field of view are assigned to the depth data.
15. The method according to claim 7, characterized in that the acquisition of depth data comprises triangulation of data from at least two auxiliary cameras.